27 research outputs found

    Graph-based keyword spotting in historical handwritten documents

    The number of handwritten documents that are digitally available is rapidly increasing. However, we observe a certain lack of accessibility to these documents, especially with respect to searching and browsing. This paper aims at closing this gap by means of a novel method for keyword spotting in ancient handwritten documents. The proposed system relies on a keypoint-based graph representation for individual words. Keypoints are characteristic points in a word image that are represented by nodes, while edges are employed to represent strokes between two keypoints. The basic task of keyword spotting is then conducted by a recent approximation algorithm for graph edit distance. The novel framework for graph-based keyword spotting is tested on the George Washington dataset, on which a state-of-the-art reference system is clearly outperformed.
    Joint IAPR International Workshops on Statistical Techniques in Pattern Recognition (SPR) and Structural and Syntactic Pattern Recognition (SSPR). S+SSPR 2016: Structural, Syntactic, and Statistical Pattern Recognition, pp. 564-573. http://link.springer.com/bookseries/558

    Über die elastische Hysterese

    No full text

    BLSTM Neural Network Based Word Retrieval for Hindi Documents

    No full text
    Retrieval from Hindi document image collections is a challenging task. This is partly due to the complexity of the script, which has more than 800 unique ligatures. In addition, segmentation and recognition of individual characters often become difficult due to the writing style as well as degradations in the print. For these reasons, robust OCRs are nonexistent for Hindi. Therefore, Hindi document repositories are not amenable to indexing and retrieval. In this paper, we propose a scheme for retrieving relevant Hindi documents in response to a query word. This approach uses BLSTM neural networks. Designed to take contextual information into account, these networks can handle word images that cannot be robustly segmented into individual characters. By zoning the Hindi words, we simplify the problem and obtain high retrieval rates. Our simplification suits the retrieval problem, while it does not apply to recognition. Our scalable retrieval scheme avoids explicit recognition of characters. An experimental evaluation on a dataset of word images gathered from two complete books demonstrates good accuracy even in the presence of printing variations and degradations. The performance is compared with baseline methods.

    Similarity-based regularization for semi-supervised learning for handwritten digit recognition

    No full text
    This paper presents an experimental analysis of the use of semi-supervised learning in the handwritten digit recognition field. More specifically, two new feedback-based techniques for retraining individual classifiers in a multi-expert scenario are discussed. These new methods analyze the final decision provided by the multi-expert system so that samples classified with a confidence greater than a specific threshold are used to update the system itself. Experimental results obtained on the CEDAR (handwritten digits) database are presented. In particular, error rate, similarity index and a new correlation score among them are considered in order to evaluate the best retraining rule. For the experimental evaluation, an SVM classifier and five different combination techniques at abstract and measurement level have been used. Finally, the results show that, when the feedback process is iterated on different multi-expert systems built with the five combination techniques, one retraining rule outperforms the other with respect to the best correlation score.
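
The feedback step described in this abstract, where only samples classified with a confidence above a threshold are fed back for retraining, can be illustrated with a minimal sketch. The function name `select_confident`, the `(label, confidence)` tuple layout, and the threshold value are assumptions made for illustration; the abstract does not specify the two retraining rules themselves, so this shows only the generic thresholding idea.

```python
from typing import Any, List, Tuple


def select_confident(
    samples: List[Any],
    predictions: List[Tuple[int, float]],
    threshold: float,
) -> List[Tuple[Any, int]]:
    """Return (sample, predicted_label) pairs whose confidence exceeds threshold.

    `predictions` holds one hypothetical (label, confidence) pair per sample,
    as might be produced by a combined multi-expert decision.
    """
    if len(samples) != len(predictions):
        raise ValueError("samples and predictions must have the same length")
    return [
        (sample, label)
        for sample, (label, confidence) in zip(samples, predictions)
        if confidence > threshold  # strictly greater, as the abstract states
    ]


if __name__ == "__main__":
    # Toy data: three unlabeled samples with made-up predictions.
    pseudo_labeled = select_confident(
        ["img_a", "img_b", "img_c"],
        [(3, 0.95), (7, 0.60), (1, 0.99)],
        threshold=0.9,
    )
    print(pseudo_labeled)
```

The selected pairs would then be added to the training set of the individual classifiers before the next iteration of the feedback process.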

    A New Smoothing Method for Lexicon-Based Handwritten Text Keyword Spotting

    No full text

    Improved BLSTM Neural Networks for Recognition of On-Line Bangla Complex Words

    No full text

    Neural network language models for off-line handwriting recognition

    Unconstrained off-line continuous handwritten text recognition is a very challenging task which has been recently addressed by different promising techniques. This work presents our latest contribution to this task, integrating neural network language models in the decoding process of three state-of-the-art systems: one based on bidirectional recurrent neural networks, another based on hybrid hidden Markov models and, finally, a combination of both. Experimental results obtained on the IAM off-line database demonstrate that consistent word error rate reductions can be achieved with neural network language models when compared with statistical N-gram language models on the three tested systems. The best word error rate, 16.1%, reported with ROVER combination of systems using neural network language models significantly outperforms current benchmark results for the IAM database.
    The authors wish to acknowledge the anonymous reviewers for their detailed and helpful comments to the paper. We also thank Alex Graves for kindly providing us with the BLSTM Neural Network source code. This work has been supported by the European project FP7-PEOPLE-2008-IAPP: 230653, the Spanish Government under project TIN2010-18958, as well as by the Swiss National Science Foundation (Project CRSI22_125220).
    Zamora Martínez, FJ.; Frinken, V.; España Boquera, S.; Castro-Bleda, MJ.; Fischer, A.; Bunke, H. (2014). Neural network language models for off-line handwriting recognition. Pattern Recognition. 47(4):1642-1652. https://doi.org/10.1016/j.patcog.2013.10.020
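
Word error rate, the metric reported in this abstract, is conventionally computed as the word-level edit distance between the recognized text and the reference transcription, divided by the number of reference words. The sketch below assumes that standard definition and whitespace tokenization; the function name `word_error_rate` is illustrative and is not taken from the paper, whose exact scoring setup is not described here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref_words)


if __name__ == "__main__":
    # One deleted word out of four reference words.
    print(word_error_rate("the quick brown fox", "the quick fox"))
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.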